

BALANCED FORESTS OF K-D* TREES AS A
DYNAMIC DATA STRUCTURE


This research was supported in part by the Office of Naval Research under
contract N00014-76-C-0914.


Submitted  November  27,  1978. 


Copyright  1978  by  Dan  E.  Willard 


All  rights  reserved.  No  part  of  this  publication  may  be  reproduced,  stored  in 
a  retrieval  system,  or  transmitted,  in  any  form  or  by  any  means,  electronic, 
mechanical,  photocopying,  recording,  or  otherwise,  without  the  prior  written 
permission  of  Dan  E.  Willard. 


INFORMATIVE ABSTRACT


By Dan E. Willard
Harvard University


Computer Science classification: 3.73, 3.74

Keywords: multidimensional searching, k-d tree, quad tree,
AVL tree, bounded balance tree, B-tree, super-B-tree


Following Bentley's suggestion (Be-75), this paper presents
an algorithm that optimizes the worst-case performance of a
dynamically changing k-d tree-like structure. Our proposed
algorithm will have an $O(\log^2 N)$ worst-case insertion and deletion
runtime. It will insure that partial match and region queries
can be performed in the same $O(N^{1-s/k})$ and $O(N^{1-1/k})$ retrieval
times previously attributed to k-d trees (Be-75, LW-77). The
coefficient associated with our dynamic retrieval algorithm
will be only slightly larger than that of the previous static
algorithms.

The data structure employed here will consist of a forest
of trees whose members are slightly modified versions of k-d
trees designated as k-d* trees. The salient characteristic of
this forest is that the number and height of its trees will be
sufficiently controlled to insure efficient worst-case runtime.


A brief discussion of attempts to apply AVL or bounded
balance techniques to k-d trees will also be found in this
paper (based on generalizations of AVL-62 and NR-73). Such
techniques will be shown to be inherently less efficient than
our balanced forests of k-d trees.

Our discussion will be primarily theoretical. Its results
are significant because they contain the best combination of
multidimensional retrieval and update runtime thus far derived
for a dynamic data structure which occupies $O(N)$ units of
memory space. In view of Rivest's earlier conjecture (Ri 76),
it may be that these results are the best attainable without
substantially expanding memory space.


Two recent articles by Bentley [Be 75] and Lee and Wong [LW 77]
have shown how partial match, partial region and total region
queries can be performed in respective $O(N^{1-s/k})$, $O(N^{1-1/k})$
and $O(N^{1-1/k})$ worst-case runtime with a data structure called a
k-d tree. k-d trees are very attractive because they occupy only
$O(N)$ units of memory space. There is currently no known
alternative data structure which occupies $O(N)$ memory space and
has a better retrieval time than k-d trees. Thus the pyramid type
data structures of Bentley-Shamos [BS 7?], Lueker [Lu 76] and
Willard [Wi 76] were able to attain better retrieval times than
k-d trees only by occupying more than $O(N)$ space.

Rivest has conjectured that it is impossible to improve upon
the $O(N^{1-s/k})$ partial match retrieval time of k-d trees without
substantially expanding memory space [Ri 76]. It is easy to show
that if Rivest's conjecture is correct, then it will be equally
impossible to develop a data structure that occupies $O(N)$ space
and supports region and partial region query operations in less
than $O(N^{1-1/k})$ worst-case time. Thus Rivest's conjecture implies
that k-d trees have the best magnitude of retrieval time which is
possible for a data structure occupying $O(N)$ space.

The discussion in the earlier articles about k-d trees was
confined to the case of a static file. The purpose of this paper
will be to generalize the previous theorems for a dynamic
environment. The two main theorems of this paper will state that:


i) none of the theorems of Bentley, Lee and Wong generalize
satisfactorily for a dynamic environment if one
adheres to the data structure of AVL or bounded balance
k-d trees,

ii) all of the theorems of Bentley, Lee and Wong generalize
quite well for a dynamic environment if one utilizes the
slightly different data structure proposed here.


More specifically, the proposed "balanced forest of k-d* trees"
will possess the same retrieval and memory space
characteristics as k-d trees, and will additionally support
$O(\log^2 N)$ worst-case record insertion and deletion operations.


The concept of a k-d tree was proposed by Bentley in Be 75
and is a generalization of his earlier concept of a quad tree
[FB 74, BS 75]. A k-d tree is designed for those applications where
the user wishes to perform queries on several distinct keys of
a record. Each node, v, of a k-d tree is assigned a discriminator,
denoted as $i_v$. The defining characteristic of a k-d tree is that:

i) all descendants of node v whose $i_v$-th key is less than
the $i_v$-th key of v must belong to v's left subtree,

ii) all descendants of v whose $i_v$-th key is larger than
the corresponding key of v must belong to its right subtree,

iii) if a descendant of v happens to have a value identical to
the one v stores in its $i_v$-th key, then the determination
of whether this descendant should be placed in v's left as
opposed to right subtree will be made upon successively examining
the $(i_v+1)$-th, $(i_v+2)$-th and other successive keys until
a key is found that breaks the tie.

The convention which the previous articles [Be 75, LW 77] used
for defining their discriminator was $i_v = 1 + (\text{depth of node } v) \bmod k$.
The same convention for determining the value of the
discriminator will be used here.
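The placement rules above can be illustrated with a minimal Python sketch. It is not taken from the paper: the names `Node`, `goes_left` and `insert` are hypothetical, the discriminator is assumed to cycle with depth and to be numbered from 1, the successive-key comparison is assumed to wrap around after the k-th key, and fully identical records are arbitrarily sent left.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Record = Tuple[int, ...]  # a record with k keys


@dataclass
class Node:
    record: Record
    disc: int                      # discriminator i_v, numbered 1..k
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def goes_left(new: Record, node: Node) -> bool:
    """Compare keys starting at the node's discriminator and move on to
    successive keys (wrapping around, an assumption) until one breaks the tie."""
    k = len(new)
    for offset in range(k):
        i = (node.disc - 1 + offset) % k   # 0-based index of the key examined
        if new[i] != node.record[i]:
            return new[i] < node.record[i]
    return True  # identical records: sent left here (an assumption)


def insert(root: Optional[Node], new: Record, depth: int = 0) -> Node:
    """Insert a record; the discriminator is assumed to cycle with depth."""
    k = len(new)
    if root is None:
        return Node(new, disc=depth % k + 1)
    if goes_left(new, root):
        root.left = insert(root.left, new, depth + 1)
    else:
        root.right = insert(root.right, new, depth + 1)
    return root
```

Inserting (5,5), (5,3), (7,1), (5,9) in that order sends (5,3) left of the root because the tie on the first key is broken by the second key.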


Four different balancing criteria for k-d trees will be
compared in this paper. The definitions of these four criteria
are given below.

i) A k-d tree will be said to be ideally balanced iff there
exists some integer such that all leaves in the tree
have a depth equal to exactly that integer.

ii) A k-d tree with height h will be said to be near-ideally
balanced iff all leaves of the tree have depths equal to
either h or h-1.

iii) A k-d tree will be said to satisfy the AVL balancing
criteria iff every pair of brothers in the tree have
heights that differ by no more than an integer of 1.

iv) Let v denote an arbitrary interior node of a k-d tree,
$N_v$ denote the number of leaves that descend from v,
$N_{v_L}$ denote the number of leaves that descend from v's
left son, and $\rho(v)$ denote the ratio $N_{v_L}/N_v$. For
any fixed number $\alpha$, a k-d tree will be said to
satisfy the BB($\alpha$) bounded balance criteria iff
every interior node of this tree satisfies the
inequality $\alpha \le \rho(v) \le 1-\alpha$.
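As an illustration only, the four criteria can be written as predicates over a bare binary tree shape. The sketch below is not from the paper: the class `Tree` and the function names are hypothetical, keys are omitted because only the shape matters, and an absent son is assumed to have height -1 for the AVL test.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Tree:
    left: Optional["Tree"] = None
    right: Optional["Tree"] = None


def height(t: Optional[Tree]) -> int:
    # An absent subtree is given height -1 (an assumption); a leaf has height 0.
    return -1 if t is None else 1 + max(height(t.left), height(t.right))


def leaf_depths(t: Tree, depth: int = 0) -> List[int]:
    if t.left is None and t.right is None:
        return [depth]
    depths: List[int] = []
    for son in (t.left, t.right):
        if son is not None:
            depths += leaf_depths(son, depth + 1)
    return depths


def is_ideal(t: Tree) -> bool:
    return len(set(leaf_depths(t))) == 1


def is_near_ideal(t: Tree) -> bool:
    h = height(t)
    return all(d in (h, h - 1) for d in leaf_depths(t))


def is_avl(t: Optional[Tree]) -> bool:
    if t is None:
        return True
    return (abs(height(t.left) - height(t.right)) <= 1
            and is_avl(t.left) and is_avl(t.right))


def is_bb(t: Optional[Tree], alpha: float) -> bool:
    if t is None or (t.left is None and t.right is None):
        return True
    n_left = len(leaf_depths(t.left)) if t.left is not None else 0
    rho = n_left / len(leaf_depths(t))   # ratio of left-son leaves to all leaves
    return (alpha <= rho <= 1 - alpha
            and is_bb(t.left, alpha) and is_bb(t.right, alpha))
```

A three-leaf tree whose left son is a leaf is near-ideal, AVL and BB(1/4), but not ideal, which shows how the classes nest.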


The distinction among the above four classes is important
because the theorems of Bentley, Lee and Wong were technically
proven only for the case of ideal k-d trees, and because these
theorems can be shown to specifically not hold for the cases
of AVL or bounded balance k-d trees (as will be seen in Theorem 1).
The next three paragraphs will give a more detailed summary of
the previous work of these authors, since their results will be
related to the algorithms proposed in the later sections of this
paper.


The first theorem about the worst-case retrieval times of
ideally balanced k-d trees was discussed in Be 75. In that paper
it was claimed that ideal k-d trees always enable partial match
queries (with s out of k keys specified) to be performed within
$O(N^{1-s/k})$ worst-case time. The concepts introduced in Bentley's
award-winning article were undoubtedly important, but the article
does contain one minor error or omission: there is no qualifying
statement indicating that the normal "$O(N^{1-s/k})$" "upper bound" on
the runtime of partial match queries will be exceeded by certain
unusual k-d trees which have a large number of records sharing
the same value in one of their keys and which also have a
dimensionality of at least 3. A corrected version of Bentley's
theorem, therefore, would contain a qualifying statement
indicating that the $O(N^{1-s/k})$ runtime estimate will prevail
as a strict worst-case upper bound for ideally-balanced k-d trees
provided the trees in question have been confirmed to contain
no more than a small number of records with the same key-value.


The need for an article similar to this one, which discusses
the generalization of k-d trees for a dynamic environment, was
recognized in Be 75. The last paragraph of that article cited
the example of AVL trees and indicated that it would be desirable
to optimize on the worst-case performance of k-d trees in a
similar manner. The discussion in this paper will be divided
into three parts. The first section will demonstrate that the
k-d retrieval theorems of Bentley, Lee and Wong do not generalize
for the specific cases of AVL or bounded balance k-d trees. The
second section will define a new data structure called the F(J)
bound forest of k-d* trees, and will intuitively explain how
this structure is more suitable for a dynamic environment. The
third section will discuss forests of k-d* trees in more detail
and explain their many desirable characteristics.


PART I

It has been shown in AVL-62 and NR-73 that the AVL and
bounded balance criteria are very useful when manipulating
one-dimensional trees in a dynamic environment. It is therefore
natural to begin the study of higher-dimensional trees by inquiring
whether analogous results hold for AVL and bounded balance k-d trees.
Theorem 1 of this paper demonstrates a surprising difficulty
with AVL and bounded balance k-d trees: their retrieval operations
do not even possess the same runtime magnitude as that associated
with ideal trees. Thus the theorem shows that there is little
analogy between one- and multi-dimensional trees. Therefore, an
efficient algorithm for inserting and deleting records in AVL and
bounded balance k-d trees would be of limited usefulness even if
it could be designed. A second difficulty with AVL and BB($\alpha$)
k-d trees will be discussed in the Appendix: that there is no
known technique for performing efficient worst-case insertion and
deletion operations on them.

The formal statement of Theorem 1 is given in the next
paragraph. In that theorem, as well as elsewhere in this paper,
the symbols "q" and "p(q,k)" will be used. The former symbol
will denote a partial match, partial region, or total region
query; the latter symbol will denote the value of 1-1/k when q
is a partial or total region query, and 1-s/k when q is a
partial match query with s keys specified. Thus, this notation
will have $O(N^{p(q,k)})$ designate the magnitude of worst-case runtime
that the previous theorems of Bentley, Lee and Wong attributed to
near-ideal k-d trees.


Theorem 1: The worst-case retrieval time needed for searches in
AVL and bounded balance k-d trees will be consistently greater
than $O(N^{p(q,k)})$ time. A more specific description of this
runtime can be given if the symbols $O(N^{p^*(q,k,\mathrm{AVL})})$ and
$O(N^{p^*(q,k,\mathrm{BB}(\alpha))})$ are used to denote the magnitudes of the
worst-case retrieval times of AVL and BB($\alpha$) k-d trees. Under
these circumstances, the following two inequalities will hold:

(1) p*(q, k, AVL) > p(q, k)

(2) p*(q, k, BB($\alpha$)) > p(q, k)


Proof: Similar techniques can be used to verify the theorem
for any query q and for any integer $k \ge 2$. Without substantial
loss of generality, it is thus sufficient to prove the theorem
for the case of partial match searches that specify one key in
2-d trees.

If q is a query of this type, then Bentley's theorem implies
that p(q,2) = 1/2. To prove the present theorem, we must then
show that more than $N^{1/2}$ time is needed to search worst-case AVL and
BB($\alpha$) trees. We shall do this by constructing a sequence of
trees that require $O(N^{\log_3 2})$ query time despite the fact that
they satisfy the AVL and BB(1/3) criteria.

The symbol T(j) will denote the j-th tree in this sequence.
If j = 0, then T(j) will be defined to be a one-node tree.
If j > 0, then T(j) will be inductively defined
to be a concatenation of three T(j-1) trees in the manner shown.


Consider a partial match request that seeks the record in T(j)
which has the largest possible KEY 1 value. Using the principle
of induction, it can be verified that this search request will
require $O(2^j)$ time (since the discriminators of the nodes involved
equal 1 and 2 respectively). Furthermore, T(j) can be easily
seen to possess exactly $3^j$ leaves and to satisfy the AVL and BB(1/3)
balancing criteria. The forced conclusion from these observations
is that an AVL or bounded balance 2-d tree with N leaves can
require $O(N^{\log_3 2})$ time to perform a partial match search. Thus
the worst-case amount of runtime needed to perform partial match
queries in AVL and bounded balance 2-d trees exceeds by an order
of magnitude the corresponding runtime for ideal trees. Analogous
reasoning can be used to show that the same inequality holds for
any other query q and any other dimension k.
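The arithmetic behind this conclusion can be checked numerically. The short Python sketch below takes the two counts stated in the proof as given ($3^j$ leaves and a query cost proportional to $2^j$) and does not rebuild the trees T(j); the function names are illustrative only.

```python
import math


def leaves(j: int) -> int:
    """Number of leaves of T(j), as stated in the proof."""
    return 3 ** j


def query_cost(j: int) -> int:
    """Magnitude of the partial match search cost in T(j), as stated in the proof."""
    return 2 ** j


def exponent(j: int) -> float:
    """The value p* for which query_cost(j) == leaves(j) ** p* (requires j >= 1)."""
    return math.log(query_cost(j)) / math.log(leaves(j))
```

For every j >= 1 the exponent equals $\log_3 2 \approx 0.63$, which exceeds the value p(q,2) = 1/2 that holds for ideal 2-d trees.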


Comment 1.1: The appendix at the end of this article contains
a list of the various values that p*(q,k,AVL) and p*(q,k,BB($\alpha$))
may assume for different queries and dimensions. The appendix
indicates that some queries have p* values only slightly above
their p(q,k) values, whereas other queries have quite large p*
values. The most surprising result is that p*(q,k,AVL) frequently
equals one. This means that searches through AVL k-d trees often
consume the same $O(N)$ retrieval time needed by an exhaustive search
through an unordered list!


Comment 1.2: The appendix also contains a description of the
expected time needed to query randomly constructed k-d trees.
This runtime is similar to that of AVL and bounded balance k-d
trees insofar as it exceeds the magnitude of ideal k-d trees.
Once again, therefore, we see that a one-dimensional tree theorem
(which stated that randomly generated trees have the same expected
search time as ideal trees) has no analog for higher-dimensioned
k-d trees.


PART II

Many of the trees in the remainder of this paper will
technically not be k-d trees but will rather be something very similar
called k-d* trees. To understand the distinction between these
concepts, one must first recall that the definition of k-d trees
[Be 75] required that there exist a one-to-one correspondence
between the nodes of the tree in question and the elements of
the list it represents. The definition of k-d* trees will
differ from that of k-d trees by having this pairing exist
only between leaves of the k-d* tree and the elements of the
corresponding list.

It is trivial to show that all the order of magnitude
retrieval runtimes previously mentioned in connection with
k-d trees are equally valid for k-d* trees. The main runtime
difference between these trees can be understood once record
deletion operations are considered. Given a height of h, it
is fairly easy to show that a record deletion operation needs
only $O(h)$ worst-case runtime in a k-d* tree, whereas it needs
a larger worst-case runtime in a k-d tree.
Deletion operations in k-d*
trees are thus more efficient than their k-d tree counterparts.
Consequently, k-d* trees will be the main tree structure
employed in the rest of this paper.

Let f denote a forest of k-d* trees whose leaves collectively
represent a list of N records. For fixed integer J, forest f
will be said to satisfy the F(J) bounding condition if for
every integer $h \ge 0$ it contains no more than
$\lceil \log_2 N \rceil + J - h$ trees
whose height is greater than h. This paper will show that the F(J)
bound forests do for k-d trees what the AVL and BB($\alpha$) conditions
did for traditional one-dimensional sorted lists. The discussion
of this topic will commence with an initial theorem about worst-case
retrieval time, and will subsequently turn to record-insertion and
deletion runtimes.
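The bounding condition can be tested mechanically from the list of tree heights. The statement of the condition is damaged in the source text, so the Python sketch below rests on an assumed reading: for every integer h >= 0, at most $\lceil \log_2 N \rceil + J - h$ trees may have height greater than h. The function name `satisfies_bound` is hypothetical.

```python
import math
from typing import List


def satisfies_bound(heights: List[int], n_records: int, J: int = 1) -> bool:
    """Check the assumed F(J) condition for a forest described by its tree heights."""
    cap = math.ceil(math.log2(n_records)) + J
    for h in range(0, max(heights, default=0) + 1):
        taller = sum(1 for x in heights if x > h)   # trees of height > h
        if taller > cap - h:
            return False
    return True
```

Under this reading, a forest holding exactly one tree of each height from 1 to $\lceil \log_2 N \rceil + J$ meets the condition with equality at every h, which matches the worst case used in the retrieval-time argument.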


Theorem 2: A partial match, partial region, or total region
query in an F(J) bounded forest of k-d* trees will never require
more than $O(N^{p(q,k)})$ retrieval time.


Proof: The worst possible retrieval runtime for an F(J) bound
forest results when this forest consists of a collection of trees
that has exactly one tree of height equal to h for every h satisfying
$0 < h \le \lceil \log_2 N \rceil + J$. The earlier theorems of Bentley, Lee and Wong
imply that no more than $O(2^{h \cdot p(q,k)})$ runtime will ever be needed
to perform query q in a tree of height h. It thus follows that
the maximum time needed to query all the trees in the F(J) bound
forest will be

$$\sum_{h=1}^{\lceil \log_2 N \rceil + J} 2^{h \cdot p(q,k)}$$

This sum can easily be shown to be less than an expression of the
form $[\,c\,]\,N^{p(q,k)}$, whose bracketed part c depends only on J and p(q,k).
The bracketed part of the latter expression should be regarded as
a runtime coefficient (since it does not contain the variable N).
Thus it has been established that all queries q can be performed
in $O(N^{p(q,k)})$ time in an F(J) bound forest.
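One way to bound the sum in this proof is the standard geometric-series estimate below. It is offered as a sketch: the coefficient it produces is not necessarily the bracketed expression of the original text, which did not survive. Here $p$ abbreviates $p(q,k)$ (which is positive) and $H = \lceil \log_2 N \rceil + J$, so that $H \le \log_2 N + J + 1$.

```latex
\[
\sum_{h=1}^{H} 2^{hp}
  \;=\; \frac{2^{p}\,\bigl(2^{Hp}-1\bigr)}{2^{p}-1}
  \;<\; \frac{2^{(H+1)p}}{2^{p}-1}
  \;\le\; \left[\frac{2^{(J+2)p}}{2^{p}-1}\right] N^{p}.
\]
```

The bracketed factor contains only J and p, so it acts as a constant coefficient on $N^{p}$.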


Comment 2.1: The specific value of the runtime coefficient of
F(J) bound forests will of course depend on the value of J. In
this paper, J will always equal 1. The associated coefficient
will be quite efficient, and it will exceed the coefficient of
near-ideal k-d* trees by only a small constant factor. (The
reason why the preceding sentence used near-ideal rather than
ideal k-d* trees as a basis of comparison is that ideal k-d
trees are impossible to construct when the number of records
in a file is not exactly a power of 2.)


Comment 2.2: The collective implication of Theorems 1 and 2
is that F(J) bound forests of k-d* trees are more efficient than their AVL
and BB($\alpha$) counterparts, and that these forests have a retrieval time
magnitude which equals that of ideal k-d trees. These results
are surprising because they indicate that k-d trees behave in
the exact opposite manner as tree representations of sorted lists.
For sorted lists, an F(J) bound forest would be less efficient
than an AVL, BB($\alpha$) or ideal tree representation, since it would
require $O(\log^2 N)$ retrieval time. Thus, a technique which was
inherently inefficient in traditional tree applications will be
shown in this paper to attain the optimal runtime magnitude
in the context of a multi-dimensional dynamic environment.


PART III

This section will show that any record can be inserted or
deleted from a balanced forest of k-d* trees in $O(\log^2 N)$ runtime.
The discussion here will be divided into two parts respectively
examining optimization of CERT and worst-case measured runtime.
To define the former measurement of runtime, we require that

i) the symbol A denote an algorithm which performs insertion
and deletion operations in a data structure denoted as D

ii) C denote a sequence of insertion and deletion commands
whose length is denoted as |C|

iii) data structure D represent the empty set of records
before command sequence C is executed

Under these circumstances, algorithm A will be said to have a
CERT (the acronym stands for "Conservative Estimate of Run Time")
equal to R iff |C| R represents the maximum amount of time
any sequence C can force A to consume.
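To make the definition concrete, the following Python sketch computes the per-command average of one command sequence and takes the largest such average over a set of sequences. It is only an illustration: the function names are hypothetical, the cost lists are invented inputs, and a true CERT ranges over all possible sequences rather than a finite sample.

```python
from typing import Iterable, List


def average_cost(costs: List[float]) -> float:
    """Total time consumed by one command sequence divided by its length |C|."""
    return sum(costs) / len(costs)


def cert_over(sequences: Iterable[List[float]]) -> float:
    """Largest per-command average among the given sequences."""
    return max(average_cost(costs) for costs in sequences)
```

A sequence with costs 1, 1, 1, 9 has a per-command average of 3 even though its single most expensive command costs 9, which is the gap between CERT-measured and worst-case runtime that the rest of this section addresses.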

The usefulness of the CERT criteria in the design of
algorithms was previously demonstrated in Wi-76. There our
goal was to optimize on the worst-case performance of a
complicated data structure called a super-B-tree. We did so by
approaching the subject matter in a top-down manner, successively
developing more sophisticated algorithms which optimized on
expected, CERT, and worst-case measured runtime. A similar
top-down method of presentation will be used here because of
its clarity, as well as its potential usefulness in a large
number of different types of computer applications.

Our algorithm for optimizing CERT-measured runtime will be
called INSDEL-1. Its arguments will consist of an F(1) bound
forest of k-d* trees (denoted as f) and a command to either
insert or delete a record in this forest (denoted as z). The
purpose of INSDEL-1 will be to apply command z to forest f in
a manner that requires $O(\log^2 N)$ CERT runtime and insures that the
F(1) bound condition will continue to hold after z is executed.
The specific procedure used by INSDEL-1 will consist of the
following three steps:

1) If z is an insertion command, then add a new one-node
tree to forest f that consists of the designated record.

2) If z is a deletion command, and if r denotes the record
specified by z, then deallocate the memory space of both
r and its father (the latter because an interior node
in a binary k-d* tree has no purpose when it possesses
only one son). Also modify the information in r's
grandfather to reflect these changes (which means
that its previous pointer to r's father should be
changed to a pointer to r's brother).

3) Note that steps 1 and 2 are capable of causing f to
violate the F(1) bound condition. The purpose of
this step will be to further modify f to insure that
the F(1) balance is restored. The specific procedure
of this step can be best explained if h denotes the
largest integer such that more than $\lceil \log_2 N \rceil + 1 - h$
of f's trees have height > h, and if L denotes the
least integer > h such that all the trees in f whose
height is less than or equal to L have a combined total
of no more than $2^L$ leaf-records. This step will take
the trees of height $\le L$ and combine them into one
k-d* tree of height L (using Bentley's tree-building
algorithm).
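The pointer changes of step 2 can be sketched in Python as follows. This is a hedged illustration of the deletion step only, not of the whole procedure: the class `Node`, the helper `join` and the use of a stored parent pointer are assumptions made for the example, and discriminator values in interior nodes are omitted.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False, repr=False)
class Node:
    record: Optional[tuple] = None     # set only on leaves of a k-d* tree
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    parent: Optional["Node"] = None    # parent pointer is an assumption


def join(left: Node, right: Node) -> Node:
    """Build an interior node (the 'father') over two subtrees."""
    father = Node(left=left, right=right)
    left.parent = right.parent = father
    return father


def delete_leaf(root: Node, r: Node) -> Optional[Node]:
    """Remove leaf r and its father; return the root afterwards (None if empty)."""
    father = r.parent
    if father is None:                 # r was a one-node tree
        return None
    brother = father.right if father.left is r else father.left
    grandfather = father.parent
    brother.parent = grandfather
    if grandfather is None:            # father was the root: brother takes over
        return brother
    if grandfather.left is father:     # redirect grandfather's pointer to brother
        grandfather.left = brother
    else:
        grandfather.right = brother
    return root
```

Once the leaf has been located, the splice itself touches only the father, brother and grandfather.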


Theorem 3: Let N denote the maximum number of records that
appear in forest f during command sequence C. The above INSDEL-1
procedure will have an $O(\log^2 N)$ CERT measured runtime.


Proof: Step 1 of INSDEL-1 can easily be seen to consume
$O(1)$ runtime, and step 2 will consume no more than
an $O(\log^2 N)$ time (because of the restriction on the height and
number of trees allowed in an F(1) forest). Thus only the runtime
of step 3 needs to be verified to prove the theorem.

Bentley's k-d tree construction theorem implies that an
invocation of step 3 will consume $O(L\,2^L)$ runtime when this step
constructs a tree of height L. Furthermore, it can be
verified that there will exist no more than $2^{-L+1}|C|$ occasions
during sequence C when a tree of height L is built. The combined
implication of these observations is that step 3 cannot spend
more than $2L|C|$ runtime constructing trees whose height is
exactly equal to L.

The previous paragraph, together with the fact that all
trees in an F(1) bound forest have heights less than
$\lceil \log_2 N \rceil + 2$,
implies that the total time constructing these trees must be less
than

$$\sum_{L=1}^{\lceil \log_2 N \rceil + 1} 2L\,|C|$$

The above sum has an $O(|C| \log^2 N)$ magnitude. Dividing this
quantity by |C|, we obtain the result that step 3 has an $O(\log^2 N)$
CERT measured runtime.
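The magnitude of this sum follows from the closed form of an arithmetic series. The short derivation below is a sketch that takes the upper limit to be $H = \lceil \log_2 N \rceil + 1$, as the surrounding text suggests.

```latex
\[
\sum_{L=1}^{H} 2L\,|C|
  \;=\; |C|\,H\,(H+1)
  \;=\; |C|\,\bigl(\lceil \log_2 N \rceil + 1\bigr)\bigl(\lceil \log_2 N \rceil + 2\bigr)
  \;=\; O\bigl(|C|\,\log^{2} N\bigr).
\]
```

Dividing by $|C|$ leaves the $O(\log^{2} N)$ per-command figure.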


The combined implication of Theorems 2 and 3 is of course
that F(1) bound forests of k-d* trees have efficient retrieval,
insertion and deletion runtimes when judged by the CERT criteria.
The final goal of this paper will be to develop a still more
efficient algorithm which optimizes on worst-case runtime.

The nature of the task ahead of us can be understood once
it is noted that worst-case runtime optimization is only slightly
more difficult than CERT optimization. Thus the $O(\log^2 N)$ CERT
runtime of INSDEL-1 will be automatically converted into a strict
worst-case runtime if the variance in runtimes of the commands
of sequence C is simply reduced.

The algorithm which performs worst-case optimization in
this paper will be called INSDEL-2($\alpha$). The $\alpha$ parameter of
this algorithm will designate a runtime coefficient which must
be greater than 2. INSDEL-2 will differ from INSDEL-1 mainly
with respect to the last sentence of step 3. The distinction is
that the latter algorithm will initiate an evolutionary process
for gradually merging several old trees into a new one rather
than performing this merger operation in one single time-consuming
step. If M denotes the number of records that should be inserted
into the new merger tree, then the INSDEL-2($\alpha$) evolutionary
process will be designed to build the new tree in piecemeal
fashion during the next $M/\alpha$ insertion and deletion commands.
Essentially if $M \log M$ denotes the approximate amount of work
needed to build the whole k-d* tree, then this evolutionary
process will spend $\alpha \log M$ runtime building the new tree
during each insertion and deletion command. Such techniques
can be formally proven to produce an INSDEL-2 procedure operating
in $O(\alpha \log^2 N)$ worst-case runtime.
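The budget arithmetic of this evolutionary process can be simulated in a few lines of Python. The sketch only counts work units; it does not build a tree, and the function name `commands_to_finish` and the use of base-2 logarithms are assumptions made for the illustration.

```python
import math


def commands_to_finish(m: int, alpha: float) -> int:
    """Count the commands needed when each one contributes alpha*log2(m)
    units toward a merger costing about m*log2(m) units (requires m >= 2)."""
    remaining = m * math.log2(m)       # approximate total construction work
    budget = alpha * math.log2(m)      # work performed during each command
    commands = 0
    while remaining > 0:
        remaining -= budget
        commands += 1
    return commands
```

With m = 1024 records and alpha = 4, each command contributes 40 units toward 10240 units of work, so the merger finishes after 256 = m/alpha commands.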

The previous two paragraphs were intended to intuitively
introduce the INSDEL-2 procedure by explaining its relationship
to INSDEL-1. The rest of this paper will give a much more
detailed description of INSDEL-2 and its runtime for the benefit
of those who wish to fully understand the subject matter.

The data structure manipulated by INSDEL-2 will consist
of a forest of k-d* trees (denoted as f), together with a
dictionary (denoted as d). Each tree in f will have two
flags associated with it. The first flag will indicate whether
the k-d* tree is "partially constructed" as opposed to "completely
constructed." The meaning of this flag can be understood once
it is recalled that the previous paragraphs indicated that INSDEL-2
would gradually construct its new trees over an extended period of
time. In this context, a tree will be said to be in a "partially
constructed" state if it is not yet fully built, and "completely
constructed" otherwise.

The second flag will indicate when a k-d* tree is planned
to be removed from the forest of k-d* trees. If INSDEL-2 is
currently building a new k-d* tree which is the merger of several
old trees, then this flag will indicate that the older k-d* trees
are in an "aging" state (these aging trees will be removed from f
as soon as construction of the merger tree is completed). If a
k-d* tree is not "aging," it will be said to be "young."

The symbols f_p, f_c, f_a and f_y will be used in this paper
to denote f's respective subsets of "partially constructed,"
"completely constructed," "aging," and "young" k-d* trees.

Also, symbols such as f_ij will denote the intersection of
the f_i and f_j subforests.
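The two flags and the subforests they induce can be pictured with a small sketch. This is only an illustration under stated assumptions: the class name Tree, the helper subforest, and the subscript letters p, c, a and y used for the variable names are hypothetical labels, and the record list merely stands in for a real k-d* tree.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Record = Tuple[float, ...]  # one value per key (assumed representation)


@dataclass
class Tree:
    """Stand-in for one k-d* tree; only the two flags matter here."""
    records: List[Record] = field(default_factory=list)
    partially_constructed: bool = False  # first flag
    aging: bool = False                  # second flag


def subforest(forest: List[Tree],
              partial: Optional[bool] = None,
              aging: Optional[bool] = None) -> List[Tree]:
    """Return the trees whose flags match; None means "either value"."""
    return [t for t in forest
            if (partial is None or t.partially_constructed == partial)
            and (aging is None or t.aging == aging)]


forest = [
    Tree([(1.0, 2.0)]),                    # completely constructed, young
    Tree([(3.0, 4.0)], aging=True),        # completely constructed, aging
    Tree([], partially_constructed=True),  # partially constructed, young
]

f_p = subforest(forest, partial=True)    # partially constructed trees
f_c = subforest(forest, partial=False)   # completely constructed trees
f_a = subforest(forest, aging=True)      # aging trees
f_y = subforest(forest, aging=False)     # young trees
f_cy = subforest(forest, partial=False, aging=False)  # an intersection
```

Passing both flags at once gives an intersection of two subforests, which is how f_cy is obtained above.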

In addition to forest f, the INSDEL-2 procedure will
require a dictionary d. Given all k Keys of a specified record,
this dictionary will enable INSDEL-2 to locate the record's
position in f in O(log N) worst-case time. The dictionary
will also support O(log N) worst-case record insertion and
deletion operations. Dictionary d is included in INSDEL-2's
data structure because the algorithm's second step will require
it. Dictionary d can easily be implemented by using a B-tree
with concatenated Keys similar to that of Lum-70.
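The role of dictionary d can be illustrated with a minimal sketch. It is not the B-tree the text calls for: a Python dict is swapped in for brevity, so lookups are expected constant time rather than O(log N) worst case. The class name RecordDictionary and the (tree id, leaf slot) location format are assumptions made for the example.

```python
from typing import Dict, Hashable, Optional, Sequence, Tuple

Key = Tuple[Hashable, ...]   # all k keys of a record, concatenated
Location = Tuple[int, int]   # hypothetical (tree id, leaf slot) pair


class RecordDictionary:
    """Maps the concatenated keys of a record to its position in f."""

    def __init__(self) -> None:
        # A dict stands in for the B-tree with concatenated keys.
        self._index: Dict[Key, Location] = {}

    def insert(self, keys: Sequence[Hashable], where: Location) -> None:
        self._index[tuple(keys)] = where

    def locate(self, keys: Sequence[Hashable]) -> Optional[Location]:
        return self._index.get(tuple(keys))

    def delete(self, keys: Sequence[Hashable]) -> Optional[Location]:
        # Returns the old location, or None if the record was absent.
        return self._index.pop(tuple(keys), None)


d = RecordDictionary()
d.insert([3, 7], (0, 5))
found = d.locate([3, 7])
removed = d.delete([3, 7])
missing = d.locate([3, 7])
```

Concatenating the k keys into one tuple is what lets a one-dimensional index answer the exact-match lookup needed by the deletion step.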


In our formal description of INSDEL-2(α), it will be
assumed that z denotes an insertion or deletion command. The
INSDEL-2(α) procedure will have five steps, the first three of
which are either identical or similar to their INSDEL-1
counterparts. These five steps are described below:

1) If z is an insertion command, then add a new one-record
tree to the forest that consists of the designated
record.

2) If z is a deletion command, then use dictionary d to
locate the trees in f that contain this record. Delete
these records using a procedure similar to step 2 of
INSDEL-1.

3) Let h denote the largest integer such that more than
⌈log N⌉ + 1 - h of the trees in f_y have height > h,
and let L denote the least integer ≥ h such that all
the trees in f_y whose height is less than or equal to L
have no more than 2^L leaf records. This step will order
the initiation of the evolutionary process (described
in step 4) that will build a new "merger" tree out of
the leaf records of those trees in f_y which (at the
time this process was initiated) had a height ≤ L.
(The change of flags accompanying this step will of
course move the affected trees from f_y to f_a.)

4) Let us recall that Bentley has shown that M log M
denotes the amount of runtime needed to build an
M-membered k-d* tree. This step will spend α log N
units of runtime on each tree in f_p to continue the
construction of these trees. (If this step completes
the process of constructing a full new tree, it will
instruct the garbage collector to deallocate the memory
space of the associated "aging" trees whose records have
been incorporated into the new tree.)

5)  The  last  step  will  update  dictionary  d  so  that  it  also 
reflects  the  record  insertion  or  deletion  command  which 
was  indicated  by  z. 
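The idea behind step 4, spending a bounded amount of construction work per update so that no single command pays for a whole rebuild, can be sketched as follows. This is a simplified illustration and not the procedure itself: the merger is represented by a sorted list rather than a k-d* tree, one unit of work is taken to be one record moved, and the names MergeJob, advance and budget_per_update are hypothetical.

```python
import heapq
import math
from typing import List, Sequence, Tuple

Record = Tuple[float, ...]


class MergeJob:
    """Builds a merged, sorted record list a few records at a time."""

    def __init__(self, aging_lists: Sequence[Sequence[Record]]) -> None:
        # heapq.merge is lazy, so no merging work happens here.
        # Each input list is assumed to be sorted already.
        self._stream = heapq.merge(*aging_lists)
        self.merged: List[Record] = []
        self.done = False

    def advance(self, budget: int) -> None:
        """Spend at most `budget` units of work on the construction."""
        for _ in range(budget):
            record = next(self._stream, None)
            if record is None:      # inputs exhausted: merger complete
                self.done = True
                return
            self.merged.append(record)


def budget_per_update(alpha: float, n: int) -> int:
    """Work allowance of about alpha * log2(n) units per update."""
    return math.ceil(alpha * math.log2(max(n, 2)))


job = MergeJob([[(1.0,), (3.0,)], [(2.0,), (4.0,)]])
job.advance(3)                    # first update: three records moved
after_first = (list(job.merged), job.done)
job.advance(3)                    # second update finishes the merger
after_second = (list(job.merged), job.done)
```

Once done becomes true, the caller would retire the aging inputs, which corresponds to the deallocation described at the end of step 4.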

Theorem 4: If parameter α is chosen to be greater than 2,
then the INSDEL-2(α) procedure will

i) possess an O(log^2 N) worst-case record insertion and
deletion runtime

ii) insure that the forest f will occupy no more than O(N)
space

iii) also assure that forest f consistently enables partial
match, partial region, and total region queries to be
performed in O(N^(1-1/k)) worst-case retrieval time

Proof of (i): It is absolutely trivial to show that steps
1, 2, 3 and 5 can be executed in O(log N) runtime. Thus only
step 4 remains to be considered. If |f_p| denotes the number
of trees in forest f_p, then this step must consume no more than
α |f_p| log N runtime. Furthermore, |f_p| can be shown to be always
strictly less than log N + 2. Step 4 can thus consume no more
Proof of (ii): Every record in our file can be shown to have
its name appear once in dictionary d and once in the forest f,
and to have no more than two additional entries in f.
This implies that a total of no more than approximately 4N
units of memory space is needed by the INSDEL-2 data structure.

QED


Proof of (iii): Consider a retrieval algorithm which employs
the procedures of Bentley, Lee and Wong to search all the k-d*
trees in the f_y forest whenever the user gives a query request.
Theorem 2 implies that such an algorithm would have an O(N^(1-1/k))
retrieval time (because f_y is an ?(1) bound forest). Unfortunately,
we cannot use this search algorithm in the context of the INSDEL-2
update procedure because that procedure does not guarantee that
all of f_y's trees will be fully constructed. Instead of searching
f_y, we must search the f_c forest (all of whose trees are fully
constructed). The difference between f_y and f_c can be shown
to increase retrieval time by a factor of no more than y _ T "c< *
This quantity should be regarded as a coefficient, since
α's value is independent of N. Hence f_c forests possess the
same O(N^(1-1/k)) retrieval time as f_y forests.


Comment 4.1: We deliberately left the value of the parameter α
unspecified in the above description of the INSDEL-2(α) procedure
because larger values of α will make the coefficient of retrieval
time improve at the expense of a less efficient worst-case insertion
and deletion coefficient. The optimal choice of coefficients will
thus depend on the requirements of the specific applications. It
should also be stated that INSDEL-2(+∞) = INSDEL-1.

Comment 4.2: The discussion in this paper was intended to be
primarily theoretical, and deliberately avoided such issues as
paging and the fact that computers are typically much busier
during some periods than during others. It is fairly easy to
revise the proposed algorithms to take these additional
considerations into account. For instance, nearby nodes
in the k-d* trees should be stored, as often as possible,
on the same page. Furthermore, step 4 of INSDEL-2 should be
treated as a background process whose execution is typically
deferred until periods when the computer would otherwise be idle.




CONCLUSION

In this article, an algorithm was first developed that
optimized on C2H1 runtime, and it was subsequently improved
to optimize on worst-case runtime. A similar topdown method
was used by us previously to design the super-B-tree algorithm
(Wi 78, Wi 79). Rivest's conjecture implies that our k-d*
forest algorithm has the best possible retrieval time for a
data structure which occupies O(N) space. In all likelihood,
other efficient worst-case algorithms can also be developed
by first considering C2RT runtime and then adding the further
models needed for worst-case optimization. We strongly
recommend this approach in topdown design of algorithms.
APPENDIX

In Theorem 1, it was indicated that AVL, BB(α) and randomly-
constructed k-d trees would consistently possess runtime
magnitudes larger than that of their ideal tree counterparts. The
purpose of this appendix is to list the specific runtimes
associated with these trees. The first three listings assume
that the user has made a partial match query that has specified
s out of the possible k Keys. The runtime for performing this
partial match query will have an N^p magnitude where p equals

i) (k - s) / (k - s - s log_2(1 - α)) for the case of
worst-case BB(α) trees

ii) the minimum of (k - s) / (k (log_2(1 + sqrt(5)) - 1)) and
one for the case of worst-case AVL trees
iii) the solution of the equation 2^k = (1+p)^(k-s) (2+p)^s
for the case of a randomly generated tree
The  runtime  magnitude  for  a  region  or  partial  region  query  in 
any  of  these  k-d  trees  will  equal  the  partial  match  runtime 
that  results  when  one  Key  is  specified  (although  the  coefficient 
will  be  larger). 
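The exponent for randomly generated trees is defined only implicitly, so a numerical sketch may help. It assumes the equation reads 2^k = (1+p)^(k-s) (2+p)^s, which is a reconstruction from a damaged scan, and it solves for p by bisection on the interval [0, 1]. The function name random_tree_exponent is hypothetical.

```python
import math


def random_tree_exponent(k: int, s: int, tol: float = 1e-12) -> float:
    """Solve 2^k = (1+p)^(k-s) * (2+p)^s for p in [0, 1] by bisection."""
    if not 0 <= s <= k or k < 1:
        raise ValueError("need k >= 1 and 0 <= s <= k")

    def g(p: float) -> float:
        # Logarithm of the right-hand side minus that of the left-hand
        # side; g is increasing in p, g(0) <= 0 and g(1) >= 0.
        return ((k - s) * math.log(1 + p)
                + s * math.log(2 + p)
                - k * math.log(2))

    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


p_2_1 = random_tree_exponent(2, 1)
```

For k = 2 and s = 1 the equation reduces to p^2 + 3p - 2 = 0, whose positive root is (sqrt(17) - 3) / 2, or about 0.56. This is slightly larger than the exponent 1 - s/k = 0.5 of a perfectly balanced k-d tree, in line with the remark that these trees have larger runtime magnitudes than their ideal counterparts.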

It is also instructive to inquire whether or not techniques
exist for executing efficient insertion and deletion operations
on AVL and bounded balance k-d trees. We conjecture that there
is no reasonable method for optimizing either the 333T or
worst-case runtime for AVL k-d trees. Bounded balance k-d trees are
much easier to manipulate: they can be assigned insertion and
deletion algorithms which operate in either O(log^2 N) CLP.T or
worst-case runtime. The latter procedure does have one serious
drawback: it consumes N log N units of additional memory space.
This allocation of memory space compares unfavorably with the
last section's k-d forests and is a serious disadvantage because
one of the purposes of k-d trees was to conserve memory. Thus
we see that balanced k-d forests are more efficient than AVL
and bounded balance k-d trees from both the perspectives of
retrieval and update operations.

